
    Tools for efficient Deep Learning

    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources, and power consumption. We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than prior work. This thesis also addresses the challenges in the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Several novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53 and 9.47 times, respectively, on Polybench/C. POLSCA achieves a 1.5 times speedup over hardware designs directly generated from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques to address the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon improves resource/power efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
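
    To make the idea of layer-wise gradient scaling concrete, here is a minimal NumPy sketch of how a per-layer loss scale keeps small gradients from underflowing in fp16. It is an illustration only, not the thesis's Aegis implementation; the function names and the target-magnitude heuristic are invented for this example.

        import numpy as np

        # Toy sketch: pick a power-of-two scale for each layer so that layer's
        # gradients survive an fp16 cast, then unscale before the optimizer step.
        def choose_scale(grad_fp32, target_max=2.0 ** 14):
            g_max = np.abs(grad_fp32).max()
            if g_max == 0.0:
                return 1.0
            return 2.0 ** np.floor(np.log2(target_max / g_max))

        def apply_layerwise_scaling(per_layer_grads):
            rescaled = []
            for g in per_layer_grads:
                s = choose_scale(g)
                g16 = (g * s).astype(np.float16)             # simulated fp16 storage
                rescaled.append(g16.astype(np.float32) / s)  # unscale for the update
            return rescaled

        # Gradients this small would underflow to zero in fp16 without scaling.
        grads = [np.array([1e-8, -3e-9, 5e-9]), np.array([0.1, -0.2])]
        print(apply_layerwise_scaling(grads))

    The point is that each layer gets its own scale, so a layer with tiny gradients is not forced to share a single global loss scale with layers whose gradients are large.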

    Hyperbolic Concentration, Anti-concentration, and Discrepancy

    The Chernoff bound is a fundamental tool in theoretical computer science. It has been extensively used in randomized algorithm design and stochastic-type analysis. Discrepancy theory, which deals with finding a bi-coloring of a set system such that the coloring of each set is balanced, has a huge number of applications in approximation algorithm design. The Chernoff bound [Che52] implies that a random bi-coloring of any set system with $n$ sets and $n$ elements will have discrepancy $O(\sqrt{n \log n})$ with high probability, while the famous result by Spencer [Spe85] shows that there exists an $O(\sqrt{n})$ discrepancy solution. The study of hyperbolic polynomials dates back to the early 20th century, when they were used to solve PDEs by Gårding [Går59]. In recent years, more applications have been found in control theory, optimization, real algebraic geometry, and so on. In particular, the breakthrough result by Marcus, Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to prove the Kadison-Singer conjecture [KS59], which is closely related to discrepancy theory. In this paper, we present a list of new results for hyperbolic polynomials: * We show two nearly optimal hyperbolic Chernoff bounds: one for the Rademacher sum of arbitrary vectors and another for random vectors in the hyperbolic cone. * We show a hyperbolic anti-concentration bound. * We generalize the hyperbolic Kadison-Singer theorem [Brä18] to vectors in sub-isotropic position, and prove a hyperbolic Spencer theorem for vectors of any constant hyperbolic rank. The classical matrix Chernoff and discrepancy results are based on the determinant polynomial. To the best of our knowledge, this paper is the first work that shows either concentration or anti-concentration results for hyperbolic polynomials. We hope our findings provide more insights into hyperbolic and discrepancy theories.
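
    For readers unfamiliar with the terminology, the set-system discrepancy the abstract refers to can be stated as follows; the notation is the standard one, not taken from the paper itself.

        % Discrepancy of a set system S_1, ..., S_n over [n] (standard definition):
        \[
          \operatorname{disc}(S_1,\dots,S_n) \;=\; \min_{\chi : [n] \to \{-1,+1\}}\;
          \max_{i \in [n]} \Bigl|\, \sum_{j \in S_i} \chi(j) \,\Bigr|.
        \]
        % A uniformly random coloring \chi achieves O(\sqrt{n \log n}) with high
        % probability (via the Chernoff bound), while Spencer's theorem guarantees
        % the existence of a coloring of discrepancy O(\sqrt{n}).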

    Collective modes of a collisional anisotropic quark-gluon plasma

    In this paper we consider the collective modes of a momentum-space anisotropic quark-gluon plasma, taking into account the effect of collisions between the plasma constituents. Our analysis is carried out using a collisional kernel of Bhatnagar-Gross-Krook (BGK) form and extends prior analyses in the literature by considering all possible angles of propagation of the gluonic modes relative to the momentum-anisotropy axis. We extract both the stable and unstable modes as a function of the collision rate and confirm prior findings that gluonic unstable modes can be eliminated from the spectrum if the collision rate is sufficiently large. In addition, we discuss the conditions necessary for the existence of unstable modes and present evidence that unstable mode growth rates are maximal for modes with momentum along the anisotropy direction. Finally, we demonstrate that, at a finite collision rate, gluonic unstable modes are absent from the spectrum at both small and large momentum anisotropy. These results pave the way for understanding the impact of collisions on a variety of non-equilibrium quark-gluon plasma observables. Comment: 19 pages and 15 figures.
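
    For orientation only, the schematic relaxation-time shape of a BGK-type collision term is recalled below; the paper's actual kernel may carry additional structure (e.g. particle-number conservation), so this is just the textbook form of such a term.

        % Schematic BGK / relaxation-time collision term:
        \[
          \mathcal{C}[f](\mathbf{p},\mathbf{x},t) \;=\; -\,\nu\,
          \bigl( f(\mathbf{p},\mathbf{x},t) - f_{\mathrm{eq}}(|\mathbf{p}|) \bigr),
        \]
        % where \nu is the collision rate discussed in the abstract and f_eq is a
        % local equilibrium distribution toward which f relaxes.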

    Symmetric Sparse Boolean Matrix Factorization and Applications

    In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given $\mathbf{M}\in\mathbb{Z}^{m\times m}$, we want to find $\mathbf{W}\in\{0,1\}^{m\times r}$ such that $\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all $\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation. Equivalently, this can be thought of as recovering a uniformly random $k$-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about $\mathbf{W}$ and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix $\mathbf{W}$ has full column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials. Comment: 33 pages, to appear in Innovations in Theoretical Computer Science (ITCS 2022), v2: updated ref
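
    A tiny NumPy sketch of the average-case model described above: the rows of W are k-sparse Boolean indicator vectors (hyperedges over r vertices), and the observation is M = W W^T. Variable names are chosen for illustration; recovering W from M is the algorithmic problem the paper solves and is not implemented here.

        import numpy as np

        rng = np.random.default_rng(0)
        m, r, k = 8, 5, 2          # m hyperedges, r vertices, k vertices per edge

        # Random Boolean W with k-sparse rows.
        W = np.zeros((m, r), dtype=int)
        for i in range(m):
            W[i, rng.choice(r, size=k, replace=False)] = 1

        # Observed matrix: M[i, j] counts vertices shared by hyperedges i and j,
        # i.e. the (weighted) line graph of the hypergraph, with k on the diagonal.
        M = W @ W.T
        print(M)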

    Efficient Algorithm for Solving Hyperbolic Programs

    Hyperbolic polynomials are a class of real-rooted polynomials with a wide range of applications in theoretical computer science. Each hyperbolic polynomial also induces a hyperbolic cone that is of particular interest in optimization due to its generality: by choosing the polynomial properly, one can easily recover classic optimization problems such as linear programming and semidefinite programming. In this work, we develop efficient algorithms for hyperbolic programming, the problem in which one wants to minimize a linear objective under a system of linear constraints, with the solution required to lie in the hyperbolic cone induced by the hyperbolic polynomial. Our algorithm is an instance of the interior point method (IPM) that, instead of following the central path, follows the central swath, a generalization of the central path. To implement the IPM efficiently, we utilize a relaxation of the hyperbolic program to a quadratic program, coupled with the first four moments of the hyperbolic eigenvalues, which are crucial for updating the optimization direction. We further show that, given an evaluation oracle for the polynomial, our algorithm requires only $O(n^2 d^{2.5})$ oracle calls, where $n$ is the number of variables and $d$ is the degree of the polynomial, plus an extra $O((n+m)^3 d^{0.5})$ arithmetic operations, where $m$ is the number of constraints.
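
    As background for the terms used above, the standard (Gårding) definitions of hyperbolicity, hyperbolic eigenvalues, and the hyperbolicity cone are recalled below; this is textbook material rather than notation taken from the paper.

        % A homogeneous polynomial p of degree d is hyperbolic with respect to a
        % direction e (with p(e) != 0) if, for every x, all roots of the univariate
        % polynomial t -> p(te - x) are real. These roots
        \[
          \lambda_1(x) \;\ge\; \dots \;\ge\; \lambda_d(x)
        \]
        % are the hyperbolic eigenvalues of x, and the (closed) hyperbolicity cone is
        \[
          \Lambda_+ \;=\; \{\, x \;:\; \lambda_d(x) \ge 0 \,\}.
        \]
        % Taking p = det on symmetric matrices with e = I recovers the PSD cone,
        % which is why hyperbolic programming subsumes semidefinite programming.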